1 Method

Genomic DNA (gDNA) reads were aligned to Potra v2.2 and T89 v2.0 with bwa, and duplicate reads were marked with Picard. Transcriptome (RNA) reads were aligned to Potra v2.2 with STAR. Alignments were visualized in IGV.
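The alignment steps above could look roughly like the commands below. This is a minimal sketch, not the exact commands used. The report does not record parameters, so several details are assumptions: the file names, the thread count, the output prefixes, the `samtools sort`/`samtools index` steps, the `picard` wrapper executable (instead of `java -jar picard.jar`), and a STAR index for Potra v2.2 that has already been built. The helpers only build command strings and do not run anything.

```python
import shlex

# Hypothetical thread count; the report does not record the parameters used.
THREADS = 8


def gdna_commands(ref_fa, r1, r2, sample):
    """Build shell command strings for gDNA alignment and duplicate marking."""
    sorted_bam = f"{sample}.sorted.bam"
    markdup_bam = f"{sample}.markdup.bam"
    # bwa mem writes SAM to stdout; samtools sort reads it from stdin ("-").
    bwa = ["bwa", "mem", "-t", str(THREADS), ref_fa, r1, r2]
    sort = ["samtools", "sort", "-o", sorted_bam, "-"]
    # Legacy Picard argument syntax (I=, O=, M=); assumed wrapper name "picard".
    markdup = ["picard", "MarkDuplicates", f"I={sorted_bam}",
               f"O={markdup_bam}", f"M={sample}.dup_metrics.txt"]
    # IGV needs an index next to the BAM.
    index = ["samtools", "index", markdup_bam]
    return [
        shlex.join(bwa) + " | " + shlex.join(sort),
        shlex.join(markdup),
        shlex.join(index),
    ]


def rnaseq_command(genome_dir, r1, r2, sample):
    """Build a STAR command for gzipped paired-end RNA reads (no GTF given)."""
    star = ["STAR", "--runThreadN", str(THREADS), "--genomeDir", genome_dir,
            "--readFilesIn", r1, r2, "--readFilesCommand", "zcat",
            "--outSAMtype", "BAM", "SortedByCoordinate",
            "--outFileNamePrefix", f"{sample}_"]
    return shlex.join(star)


if __name__ == "__main__":
    # Hypothetical input files.
    for cmd in gdna_commands("Potra_v2.2.fa", "line3_R1.fq.gz", "line3_R2.fq.gz", "line3"):
        print(cmd)
    print(rnaseq_command("star_index_potra_v2.2", "line3_rna_R1.fq.gz",
                         "line3_rna_R2.fq.gz", "line3"))
```

Picard's argument syntax differs between releases (legacy `I=`/`O=`/`M=` versus `-I`/`-O`/`-M`). Check the documentation for the installed version before reusing this.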

2 Regions matching the guide RNAs, using Potra v2.2 as reference

Here we show the genome and transcriptome alignments in the regions matched by the guide RNAs. We look at the results with Potra v2.2 as the reference first, because a gene model annotation is available for it.

2.1 Overview of the guide RNA locations and the target gene

2.1.1 Target gene of line 3 Potra2n18c32411

Here we can see Line3_CRISPR_4108_gRNA1 and Line3_CRISPR_4108_gRNA2 matching the first exon and the fourth exon of the target gene, respectively. There is a deletion (chr18:3,718,735-3,719,224) downstream of the Line3_CRISPR_4108_gRNA2 site in the line 26 genome track. The target gene transcripts are disrupted in the line 3 transcriptome tracks, and the fourth exon is skipped in the T89 and line 26 transcriptome tracks.
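As a quick consistency check, the deletion length can be computed from the IGV locus string. IGV shows 1-based, inclusive coordinates, so the length is end - start + 1. `region_length` is a hypothetical helper written for this illustration.

```python
import re


def region_length(locus: str) -> int:
    """Length in bp of an IGV-style locus 'chrom:start-end' (1-based, inclusive)."""
    m = re.fullmatch(r"([^:]+):([\d,]+)-([\d,]+)", locus.strip())
    if m is None:
        raise ValueError(f"not a chrom:start-end locus: {locus!r}")
    # IGV prints thousands separators; strip them before converting.
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    return end - start + 1


print(region_length("chr18:3,718,735-3,719,224"))  # 490
```

The result matches the 490 bp deletion length reported for the Line3_CRISPR_4108_gRNA2 close-up.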

2.1.2 Target gene of line 26 Potra2n6c13821

Here we can see Line26_CRISPR_4110_gRNA1 and Line26_CRISPR_4110_gRNA2 matching the first exon and the third intron of the target gene, respectively. All splice junctions in the transcriptome tracks are unaltered.

2.2 Close-ups on the guide RNA sites

2.2.1 Line3_CRISPR_4108_gRNA1 targeting Line 3

Sequence: GACTAGACGTACAATGGGTT

reverse complement: AACCCATTGTACGTCTAGTC
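The reverse complements listed for each gRNA are the sequences you would search for when the site matches the minus strand of the reference. A minimal sketch that handles unambiguous bases only (IUPAC ambiguity codes are not complemented):

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("GACTAGACGTACAATGGGTT"))  # AACCCATTGTACGTCTAGTC
```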

Here we can see a single-base (T) insertion within the gRNA site in the line 26 genome track. This insertion is not present in the line 3 genome track. The insertion lies 3 bp from the PAM, at the expected cut position, so we are fairly confident that it is the result of CRISPR-Cas editing. The insertion is present in approximately half of the reads. There is also a 5 bp upstream shift of the splice donor site of the first intron, which is likely caused by the single-base insertion; this shift can be seen in the line 3 transcriptome tracks. Note that the gRNA base furthest from the PAM is mismatched.
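The "3 bp from the PAM" argument can be made explicit by predicting the cut position from the gRNA and the reference. The sketch below assumes SpCas9 (20-nt protospacer, NGG PAM directly 3' of the protospacer, blunt cut 3 bp upstream of the PAM). The report does not name the nuclease, so treat that as an assumption. It only finds exact matches, so a site whose distal base is mismatched, as noted for this gRNA, would need a mismatch-tolerant search. The function names and the example reference are hypothetical.

```python
import re

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMP)[::-1]


def cas9_cut_sites(ref: str, grna: str):
    """Predict SpCas9 cut positions for exact grna matches on both strands.

    Returns (strand, cut) tuples. cut is the 0-based index in ref of the first
    base to the right of the blunt cut, so the cut lies between ref[cut - 1]
    and ref[cut].
    """
    ref, grna = ref.upper(), grna.upper()
    n = len(grna)
    hits = []
    # Forward strand: protospacer followed by NGG; cut 3 bp before the PAM.
    for m in re.finditer(f"(?={grna}[ACGT]GG)", ref):
        hits.append(("+", m.start() + n - 3))
    # Reverse strand: CCN then the reverse-complemented protospacer; the cut
    # falls 3 bp into the protospacer, counting from the PAM side.
    rc = revcomp(grna)
    for m in re.finditer(f"(?=CC[ACGT]{rc})", ref):
        hits.append(("-", m.start() + 3 + 3))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up reference: 4 bp flank, protospacer, AGG PAM, 4 bp flank.
g = "GACTAGACGTACAATGGGTT"
ref = "AAAA" + g + "AGG" + "TTTT"
print(cas9_cut_sites(ref, g))  # [('+', 21)]  -> cut between ...ATGG|GTT
```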

The RNA reads in the line 3 transcriptome tracks that carry the shifted splice donor of the first intron also have a shifted splice acceptor. They skip exons 2 and 3, which are expressed in the control, and include exon 4, which is skipped in the control. With the new splice donor, the splice acceptor of exon 4 is shifted 52 bp upstream of its annotated position.

2.2.2 Line3_CRISPR_4108_gRNA2 targeting Line 3

Sequence: GCACATAAGCAGATACGCTC

reverse complement: GAGCGTATCTGCTTATGTGC

Here we can see a single-base (T) insertion within the gRNA site in the line 26 genome track and the line 3 transcriptome tracks. The insertion is displayed differently in the genome and transcriptome tracks because the two alignments place it differently. The actual sequence at the CRISPR cut site is ATG[T]TGC, where the bracketed T is the inserted base. The gRNA site lies in exon 4, which is skipped in the T89 and line 26 transcriptomes, so those tracks have no reads at this site. This single-base insertion is not present in the line 3 genome track, so it is likely to be the result of CRISPR-Cas editing. The insertion lies 3 bp from the PAM. Note that some reads carry no insertion.

There is also a drop in coverage in the line 26 genome track. It starts within the gRNA site, 2 bp from the PAM, and continues downstream, which suggests a deletion. The deletion is 490 bp long and is present in only approximately half of the reads.
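The "approximately half of the reads" estimate can be quantified by comparing the mean depth inside the putative deletion with the mean depth in the flanking sequence. The sketch below assumes a per-base depth list, for example one parsed from `samtools depth -a` output. The flank size and the example depths are made up and do not come from this dataset.

```python
from statistics import mean


def deletion_fraction(depth, del_start, del_end, flank=200):
    """Estimate the fraction of reads carrying a deletion from per-base depth.

    depth: read depth per position (0-based list).
    del_start, del_end: 0-based, half-open interval of the putative deletion.
    Returns 1 - mean(depth inside) / mean(depth in both flanks).
    """
    inside = depth[del_start:del_end]
    flanks = depth[max(0, del_start - flank):del_start] + depth[del_end:del_end + flank]
    if not inside or not flanks:
        raise ValueError("deletion interval or flanks are empty")
    baseline = mean(flanks)
    if baseline == 0:
        raise ValueError("zero flanking coverage")
    return 1 - mean(inside) / baseline


# Made-up depths: 30x flanks around a 490 bp stretch at 15x.
depth = [30] * 200 + [15] * 490 + [30] * 200
print(deletion_fraction(depth, 200, 690))  # 0.5
```

At lower coverage, as with the T89 reference, this ratio becomes noisy. This is one reason a large deletion is harder to call there.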

2.2.3 Line26_CRISPR_4110_gRNA1 targeting Line 26

Sequence: GGTCATAATACGCTGGACTT

reverse complement: AAGTCCAGCGTATTATGACC

Here we can see single-base insertions within the gRNA site in the line 3 genome track and the line 26 transcriptome tracks. These insertions are not present in the line 26 genome track or in the line 3 and T89 transcriptome tracks, so they are likely to be the result of CRISPR-Cas editing. The insertions lie 3 bp from the PAM. They are displayed at slightly different positions across tracks because the alignments place them differently. The actual sequence at the CRISPR cut site is AAG[G]TCC or AAG[T]TCC, where the bracketed base is the insertion. The G and T insertions are mutually exclusive: no read carries both. Note that some reads carry no insertion.
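To support the "mutually exclusive" statement with counts rather than visual inspection, one can tally the inserted sequence at the cut site in each read. The sketch below works on already-extracted per-read insertion strings. Extracting them from the BAM (for example by walking CIGAR strings) is not shown, and the read list is hypothetical.

```python
from collections import Counter


def insertion_spectrum(inserted_seqs):
    """Tally the inserted sequence at the cut site, one entry per read.

    inserted_seqs: one string per read spanning the site ("" = no insertion).
    Returns (allele -> count, allele -> fraction of reads).
    """
    counts = Counter(seq.upper() for seq in inserted_seqs)
    total = sum(counts.values())
    if total == 0:
        return {}, {}
    fractions = {allele: n / total for allele, n in counts.items()}
    return dict(counts), fractions


# Hypothetical reads spanning the Line26_CRISPR_4110_gRNA1 cut site.
reads = ["", "G", "T", "", "G", "T", "", ""]
counts, fractions = insertion_spectrum(reads)
print(counts)  # {'': 4, 'G': 2, 'T': 2}
# A read carrying both insertions would show up as a multi-base allele ("GT" or "TG").
print(any(len(allele) > 1 for allele in counts))  # False
```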

2.2.4 Line26_CRISPR_4110_gRNA2 targeting Line 26

Sequence: TATAAAGAGCAAGAATTGAC

reverse complement: GTCAATTCTTGCTCTTTATA

Here we cannot see any difference between the line 3 and line 26 genome tracks. The gRNA site lies within an intron, so there are no transcriptome data for it. We found an A>G substitution, a single-base (T) insertion, and a single-base deletion within the gRNA site in both the line 26 and line 3 genome tracks. The reads carrying the A>G substitution and the single-base deletion never carry the single-base insertion. This leads us to conclude that these variants come from different haplotypes (P. tremula and P. tremuloides).
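The haplotype argument rests on which variants co-occur in the same reads. The variants all lie within the 20 bp gRNA site, so every read covering the site reports all of them. A pairwise co-occurrence count then shows which variants travel together. In the sketch below, the variant labels and the read sets are hypothetical.

```python
from itertools import combinations


def cooccurrence(reads):
    """Count, for every pair of variants, the reads that carry both.

    reads: iterable of sets of variant labels observed in each read.
    Returns {(a, b): count} with a < b, over all variants seen in the data.
    """
    reads = [frozenset(r) for r in reads]
    variants = sorted(set().union(*reads))
    return {
        (a, b): sum(1 for r in reads if a in r and b in r)
        for a, b in combinations(variants, 2)
    }


# Hypothetical reads: A>G and the deletion travel together; insT never joins them.
reads = [{"A>G", "del"}, {"A>G", "del"}, {"insT"}, {"insT"}, {"A>G", "del"}]
print(cooccurrence(reads))
# {('A>G', 'del'): 3, ('A>G', 'insT'): 0, ('del', 'insT'): 0}
```

A zero count for every pair that crosses two groups of variants is the pattern expected when the groups sit on different haplotypes.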

The Potra v2.2 reference (…GCTCTTATA) lacks one T relative to the gRNA (…GCTCTTTATA), and the single T insertion restores the gRNA sequence. In principle, the gRNA could therefore act in a haplotype-specific way. However, in the reads that carry the single T insertion, the PAM lies 2 bp away from the gRNA site rather than directly adjacent to it. This may explain why we cannot see any difference between line 3 and line 26.
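The "PAM 2 bp away" observation can be expressed as the gap between the 3' end of the protospacer and the nearest downstream NGG. The sketch assumes an SpCas9-style NGG PAM, which must sit at gap 0 for efficient cutting. The flanking bases in the example are made up and are not the actual Potra v2.2 or T89 sequence around this site.

```python
import re


def pam_gap(seq: str, protospacer: str, max_gap: int = 5):
    """Gap (bp) between the protospacer's 3' end and the nearest downstream NGG.

    Returns None if the protospacer is absent or no NGG starts within max_gap bp.
    A gap of 0 means the PAM is directly adjacent to the protospacer.
    """
    seq, protospacer = seq.upper(), protospacer.upper()
    start = seq.find(protospacer)
    if start == -1:
        return None
    end = start + len(protospacer)
    for gap in range(max_gap + 1):
        if re.fullmatch("[ACGT]GG", seq[end + gap:end + gap + 3]):
            return gap
    return None


g = "TATAAAGAGCAAGAATTGAC"
print(pam_gap("CC" + g + "AGG", g))       # 0 (PAM directly adjacent)
print(pam_gap("CC" + g + "TTAGG", g))     # 2 (two extra bases before the PAM)
```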

2.3 Summary from the alignment using Potra v2.2 as reference

  • There is very likely a sample swap between the line 26 and line 3 genomic DNA: edits expected from the line 3 guide RNAs appear in the line 26 genome track, and vice versa.

  • Guide RNAs targeting line 3 result in a shift of the splice donor site of the first intron, skipping of the second and third exons, inclusion of the fourth exon with a shifted splice acceptor site, and possibly a 490 bp deletion within the fourth intron of the target gene.

  • Guide RNA 1 targeting line 26 results in single base insertions in the first exon.

  • Guide RNA 2 targeting line 26 likely fails to edit because the PAM lies 2 bp from the gRNA site.

3 Regions matching the guide RNAs, using T89 as reference

The T89 reference has two haplotypes, primary and alternative. Very likely one corresponds to P. tremula and the other to P. tremuloides. We aligned the target genes from Potra v2.2 to both haplotypes and focused on the gRNA sites that fall within the target gene regions.

3.1 Close-ups on the guide RNA sites

3.1.1 Line3_CRISPR_4108_gRNA1 targeting Line 3

Sequence: GACTAGACGTACAATGGGTT

reverse complement: AACCCATTGTACGTCTAGTC

There is no insertion at the gRNA site within the target gene on contig utg000005l of the primary haplotype.

However, the line 26 genome track shows a single-base (T) insertion at two identical gRNA sites located 8.5 kb and 19 kb upstream of the target gene on the same contig, utg000005l.
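Because the gRNA matches several copies of the target gene fragment on utg000005l, it helps to list every site on a contig, allowing for a mismatched distal base. The sketch below is a brute-force scan of both strands that allows up to `max_mismatches` substitutions (no indels). This is fine for a contig-sized region; dedicated off-target search tools are better suited to whole genomes. The function names and the example contig are hypothetical.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMP)[::-1]


def find_grna_sites(contig: str, grna: str, max_mismatches: int = 1):
    """Scan both strands of contig for grna with up to max_mismatches substitutions.

    Returns (start, strand, mismatches) tuples; start is the 0-based leftmost
    position of the match in forward-strand coordinates. Indels are not handled.
    """
    contig, grna = contig.upper(), grna.upper()
    n = len(grna)
    probes = {"+": grna, "-": revcomp(grna)}
    hits = []
    for start in range(len(contig) - n + 1):
        window = contig[start:start + n]
        for strand, probe in probes.items():
            mm = sum(a != b for a, b in zip(window, probe))
            if mm <= max_mismatches:
                hits.append((start, strand, mm))
    return hits


# Made-up contig: one exact forward copy and one minus-strand copy whose
# PAM-distal base differs from the gRNA.
g = "GACTAGACGTACAATGGGTT"
g_mut = "T" + g[1:]
contig = "A" * 10 + g + "C" * 15 + revcomp(g_mut) + "A" * 5
print(find_grna_sites(contig, g))  # [(10, '+', 0), (45, '-', 1)]
```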

On utg000256l-alternative-haplotype, the line 26 genome track shows a 37 bp deletion covering the gRNA site and the PAM.

  • We also checked transcriptomic reads mapped to the T89 haplotypes without an annotation. At the gRNA1 sites 8.5 kb and 19 kb upstream of the target gene on the primary haplotype, each gene copy contains only the first three exons, so these copies have no matching site for gRNA2. The 5 bp upstream shift of the first-intron splice donor is seen in a portion of the reads at the target gene and at the site 19 kb upstream in Line3_2 and Line3_4. Taking all three sites together, we see the single T insertion and the splice donor shift, as with Potra v2.2. Note that we cannot find the 37 bp deletion of the alternative haplotype in the transcriptome. All transcriptomic reads mapped to the alternative haplotype resemble those on the primary haplotype, with a mixture of normal and shifted splice donors.

3.1.2 Line3_CRISPR_4108_gRNA2 targeting Line 3

Sequence: GCACATAAGCAGATACGCTC

reverse complement: GAGCGTATCTGCTTATGTGC

We found the single-base (T) insertion within the gRNA site in the line 26 genome track on both the primary (utg000005l) and alternative (utg000256l-alternative-haplotype) contigs. The picture below is from utg000005l.

And the picture below is from utg000256l-alternative-haplotype.

Next, we investigated the 490 bp deletion seen with Potra v2.2 as the reference. Coverage is lower with T89 as the reference, and a large deletion is harder to detect at lower coverage. Even so, we can roughly see the boundaries of the same deletion in the line 26 genome track on both haplotypes. Both are 490 bp long. The picture below is from utg000005l.

And the picture below is from utg000256l-alternative-haplotype.

3.1.3 Line26_CRISPR_4110_gRNA1 targeting Line 26

Sequence: GGTCATAATACGCTGGACTT

reverse complement: AAGTCCAGCGTATTATGACC

We found the single-base (G or T) insertions within the gRNA site in the line 3 genome track on both the primary (utg000014l) and alternative (utg000037l-alternative-haplotype) contigs. The picture below is from utg000014l.

And the picture below is from utg000037l-alternative-haplotype.

3.1.4 Line26_CRISPR_4110_gRNA2 targeting Line 26

Sequence: TATAAAGAGCAAGAATTGAC

reverse complement: GTCAATTCTTGCTCTTTATA

Here we plot only the alternative haplotype (utg000037l-alternative-haplotype), because it matches the gRNA sequence and the primary haplotype does not. The single T insertion is no longer visible because this reference sequence already matches the gRNA. However, the PAM lies 2 bp downstream of the gRNA site, which likely prevents CRISPR-Cas from editing this location.

3.2 Summary from the alignment using T89 as reference

  • The results still suggest a sample swap between the line 26 and line 3 genomic DNA.

  • Guide RNA 1 targeting line 3 results in a single-base insertion and likely a splice donor shift. The exact location of this edit remains ambiguous because additional copies of target gene fragments are present in the T89 genome.

  • Guide RNA 2 targeting line 3 results in a single-base insertion. The possible 490 bp deletion following the gRNA2 site is still visible.

  • Guide RNA 1 targeting line 26 results in single base insertions.

  • Guide RNA 2 targeting line 26 likely fails to edit because the PAM lies 2 bp from the gRNA site.

 


Created by Fai Theerarat Kochakarn

theerarat.kochakarn@umu.se